Parallel Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU
Abstract
A novel algorithm for computing the incomplete-LU and Cholesky factorization with 0 fill-in on a graphics processing unit (GPU) is proposed. It implements the incomplete factorization of the given matrix in two phases. First, the symbolic analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the numerical factorization phase obtains the resulting lower and upper sparse triangular factors by iterating sequentially across the constructed levels. Within each level, the Gaussian elimination of the elements below the main diagonal is performed in parallel across that level's rows. Numerical experiments show that the numerical factorization phase achieves on average more than 2.8× speedup over MKL, while the incomplete-LU and Cholesky preconditioned iterative methods achieve an average 2× speedup on the GPU over their CPU implementation.
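The two phases described in the abstract can be illustrated with a small CPU-side sketch. It is not the paper's GPU implementation: it uses dense NumPy arrays with a boolean sparsity mask instead of a sparse storage format, and the function names `build_levels` and `ilu0_by_levels` are hypothetical. The inner loop over the rows of a level runs sequentially here; it is the loop that the abstract describes as running in parallel.

```python
import numpy as np

def build_levels(pattern):
    """Symbolic analysis (sketch): row i depends on every earlier row k
    for which entry (i, k) is in the sparsity pattern."""
    n = pattern.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = [k for k in range(i) if pattern[i, k]]
        if deps:
            level[i] = 1 + max(level[k] for k in deps)
    # Rows that share a level have no dependencies on each other.
    return [[int(r) for r in np.flatnonzero(level == l)]
            for l in range(level.max() + 1)]

def ilu0_by_levels(A):
    """Numerical factorization (sketch): ILU(0) processed level by level.
    Returns the combined factors (unit-diagonal L below the diagonal,
    U on and above it) and the levels that were used."""
    LU = A.astype(float).copy()
    pattern = A != 0              # zero fill-in: updates stay on this pattern
    n = A.shape[0]
    levels = build_levels(pattern)
    for rows in levels:           # sequential across levels
        for i in rows:            # independent rows; parallel on a GPU
            for k in range(i):
                if not pattern[i, k]:
                    continue
                LU[i, k] /= LU[k, k]          # multiplier l_ik
                for j in range(k + 1, n):
                    if pattern[i, j]:         # drop any fill-in
                        LU[i, j] -= LU[i, k] * LU[k, j]
    return LU, levels

# Rows 0 and 1 are independent; row 2 depends on both.
A = np.array([[4.0, 0.0, 1.0],
              [0.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
LU, levels = ilu0_by_levels(A)
L = np.tril(LU, -1) + np.eye(3)
U = np.triu(LU)
```

In this example the first level holds rows 0 and 1 and the second level holds row 2. Because eliminating this particular matrix creates no fill-in, the product of `L` and `U` reproduces `A`; in general ILU(0) only matches `A` on the entries of the original sparsity pattern.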
Similar resources
An Improved Implementation of Preconditioned Conjugate Gradient Method on GPU
An improved implementation of the Preconditioned Conjugate Gradient method on GPU using CUDA (Compute Unified Device Architecture) is proposed. It aims to solve the Poisson equation arising in liquid animation with high efficiency. We consider the features of the linear system obtained from the Poisson equation and propose an optimization method to solve it. First, a novel storage format call...
A Parallel Preconditioned Conjugate Gradient Package for Solving Sparse Linear Systems on a Cray Y-MP
In this paper we discuss current activities at Cray Research to develop general-purpose, production-quality software for the efficient solution of sparse linear systems. In particular, we discuss our development of a package of iterative methods that includes Conjugate Gradient and related methods (GMRES, ORTHOMIN and others) along with several preconditioners (incomplete Cholesky and LU factoriz...
Modified Incomplete Cholesky Preconditioned Conjugate Gradient Algorithm on GPU for the 3D Parabolic Equation
This study presents an efficient parallel method, based on the modified incomplete Cholesky preconditioned conjugate gradient algorithm (MICPCGA) on the GPU, for solving the three-dimensional partial differential equation $u_t = u_{xx} + u_{yy} + u_{zz}$. In our proposed method, for this case, we overcome the drawback that the MIC preconditioner is generally difficult to parallelize on the GPU ...
Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU
A novel algorithm for solving in parallel a sparse triangular linear system on a graphical processing unit is proposed. It implements the solution of the triangular system in two phases. First, the analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially ...
Parallel multilevel iterative linear solvers with unstructured adaptive grids for simulations in earth science
In many large-scale scientific simulation codes, the majority of computation is devoted to linear solvers. Preconditioned Krylov iterative solvers, such as the conjugate gradient method with incomplete Cholesky factorization preconditioning (ICCG), provide robust convergence for a wide range of scientific applications. Incomplete Cholesky (IC) and incomplete LU (ILU) factorizations involve globally d...